FIX Support errors in `MultiPromptSendingAttack`, add safe completion support to `SelfAskRefusalScorer` by fdubut · Pull Request #1366 · Azure/PyRIT

fdubut · 2026-02-12T00:52:52Z

Description

A couple of fixes:

- Support content moderation errors in `MultiPromptSendingAttack`. Currently, the attack fails with an uncaught exception if one of the intermediate prompts returns a moderation error. With the fix, it fails gracefully.
- Support safe completions in `SelfAskRefusalScorer`. Currently, the scorer leans towards "not a refusal" if the model returns a safe completion (which most modern models post GPT-5 will do). With the added option, safe completions are considered a refusal. The default is unchanged; this is an additional template that users can select when they instantiate the scorer.
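To illustrate the first fix, here is a minimal sketch of the "fail gracefully" pattern: a multi-prompt loop that stops and returns a failed result when an intermediate response carries an error, instead of letting an exception escape. This is not the code from this PR. `Response`, `AttackResult`, `send_prompts`, and the `"blocked"` error value are hypothetical stand-ins and do not reflect PyRIT's actual classes or signatures.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Response:
    """Hypothetical stand-in for a target response (not a PyRIT class)."""
    text: str
    error: str = "none"  # assumed values: "none" or "blocked"


@dataclass
class AttackResult:
    """Hypothetical result record (not PyRIT's result type)."""
    outcome: str                       # "completed" or "failed"
    last_response: Optional[Response]  # last response received, if any
    prompts_sent: int                  # how many prompts were actually sent
    reason: str = ""                   # human-readable failure reason


def send_prompts(prompts: List[str], send: Callable[[str], Response]) -> AttackResult:
    """Send prompts in order; stop at the first response that reports an error."""
    last: Optional[Response] = None
    for index, prompt in enumerate(prompts, start=1):
        last = send(prompt)
        if last.error != "none":
            # Return a failed result instead of raising, so the caller
            # can inspect which prompt was blocked and why.
            return AttackResult(
                outcome="failed",
                last_response=last,
                prompts_sent=index,
                reason=f"prompt {index} returned error '{last.error}'",
            )
    return AttackResult(outcome="completed", last_response=last, prompts_sent=len(prompts))
```

With this shape, a moderation block on the second of three prompts yields a result with `outcome == "failed"` and `prompts_sent == 2`, and the third prompt is never sent.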

Tests and Documentation

Added one test to verify that `SelfAskRefusalScorer` raises an exception when no objective is provided and safe completions are disallowed (the scorer needs to know the objective to assess whether this was a "safe" completion or a "true" completion).
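The check described above can be sketched roughly as follows. This is an illustration only, not the scorer's implementation: `build_refusal_prompt`, the `allow_safe_completions` flag, and the use of `ValueError` are assumptions made for this example, and the real parameter names and exception type may differ.

```python
from typing import Optional


def build_refusal_prompt(
    response_text: str,
    objective: Optional[str] = None,
    allow_safe_completions: bool = True,
) -> str:
    """Build the text handed to the refusal judge (hypothetical helper).

    When safe completions count as refusals, the judge must compare the
    response against the objective, so a missing objective is an error.
    """
    if not allow_safe_completions and not objective:
        raise ValueError(
            "An objective is required when safe completions are treated as refusals."
        )

    lines = []
    if objective:
        lines.append(f"Objective: {objective}")
    lines.append(f"Response: {response_text}")
    return "\n".join(lines)
```

In the default mode the objective stays optional, which matches the statement that the default behaviour is unchanged.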

pyrit/datasets/score/refusal/refusal_no_safe_completions.yaml

pyrit/score/true_false/self_ask_refusal_scorer.py

fdubut added 3 commits February 11, 2026 10:48

Add support for safe completions in refusal scorer

d5fc459

Fix blocked/error handling of MultiPromptSendingAttack

86e5985

Merge branch 'main' of https://github.com/fdubut/PyRIT into bug_fixes

5bfe81b

rlundeen2 reviewed Feb 17, 2026

pyrit/datasets/score/refusal/refusal_no_safe_completions.yaml (outdated, resolved)

rlundeen2 reviewed Feb 17, 2026

pyrit/score/true_false/self_ask_refusal_scorer.py (outdated, resolved)

rlundeen2 reviewed Feb 17, 2026

pyrit/score/true_false/self_ask_refusal_scorer.py (outdated, resolved)

rlundeen2 reviewed Feb 17, 2026

pyrit/score/true_false/self_ask_refusal_scorer.py (resolved)

rlundeen2 assigned jsong468 Feb 17, 2026

jsong468 added 9 commits February 17, 2026 12:49

Merge branch 'main' into bug_fixes

84d500a

PR feedback

77b26a6

docstring

4d702f0

Merge branch 'main' into bug_fixes

5a681ee

api.rst update

44fbbe3

clean up yamls

ba50480

clean up yamls2

de16089

sphinx error

fce2866

sphinx error2

806c576

jsong468 approved these changes Feb 18, 2026

jsong468 merged commit 8f923e2 into Azure:main Feb 18, 2026
29 checks passed

jsong468 merged 12 commits into Azure:main from fdubut:bug_fixes

3 participants
